Structural Inferences from Massive Datasets
Author
Abstract
High-level understanding of data must involve the interplay of substantial prior knowledge with geometric and statistical techniques. Our approach emphasizes the recovery of basic structural elements and their interaction patterns in order to summarize and draw inferences about the significant features contained in the data. As a testbed for modeling how scientists analyze and extract knowledge of structure morphogenesis from data, we examine datasets obtained from numerical simulation of turbulence. We describe a program that automatically extracts 3D structures, classifies them geometrically, and analyzes their spatial and temporal coherence. Our program is constructed by mixing and matching the aggregate, classify, and re-describe operators of the spatial aggregation language. The research is a continuation of the effort to investigate the role of imagistic reasoning in human thinking.
Similar resources
Enabling Scalable Data Analysis of Computational Structural Biology Datasets on Distributed Memory Systems supported by the MapReduce Paradigm
Today, petascale platforms perform large-scale simulations and generate massive amounts of data in a distributed fashion at unprecedented rates. This massive amount of data presents new challenges for the scientists analyzing the data’s scientific meaning. Specifically, in the case of classification and clustering of the data, traditional analysis methods require the comparison of single records wit...
Strider-lsa: Massive RDF Stream Reasoning in the Cloud
Reasoning over semantically annotated data is an emerging trend in stream processing aiming to produce sound and complete answers to a set of continuous queries. It usually comes at the cost of finding a trade-off between data throughput and the cost of expressive inferences. Strider-lsa proposes such a trade-off and combines a scalable RDF stream processing engine with an efficient reasoning sy...
Network models of massive datasets
We give a brief overview of the methodology of modeling massive datasets arising in various applications as networks. This approach is often useful for extracting non-trivial information from the datasets by applying standard graph-theoretic techniques. We also point out that graphs representing datasets coming from diverse practical fields have a similar power-law structure, which indicates th...
Adding Context to Semantic Data-Driven Paraphrasing
Recognizing lexical inferences between pairs of terms is a common task in NLP applications, which should typically be performed within a given context. Such context-sensitive inferences have to consider both term meaning in context as well as the fine-grained relation holding between the terms. Hence, to develop suitable lexical inference methods, we need datasets that are annotated with fine-g...
Rapid Processing of Synthetic Seismograms Using Windows
Currently, numerically simulated synthetic seismograms are widely used by seismologists for seismological inferences. The generation of these synthetic seismograms requires a large amount of computing resources, and the maintenance of these observed seismograms requires massive storage. Traditional high-performance computing platforms are inefficient at handling these applications because rapid comp...